Getting to Grips with Git
Why Git?
You will need two things to get started (you should have already done this if following the class in sequence):
- `git` installed on your computer: git-scm.com/downloads
- A GitHub account: github.com
Version control has been around longer than the Internet. Broadly, it was designed to achieve three things:
- A record of any and all edits made to a file containing code.
- A means of allowing developers to share edits with one another.
- A way of reconciling conflicts when two (or more) developers edited the same code.
> You can use Git anywhere you have files that are changing (hint: essays, notes, projects, assessments, code…) and need/want to track them.
Bonus: you also gain free backups if some part of your version control system is on a different computer!
How It Works
The natural way normal people think about managing versions of a document is to save a copy with a new name that somehow shows which version is most recent.
The natural way developers used to think about managing versions of a document is to have a master copy somewhere. Everyone asks the server for the master copy, makes some changes, and then checks those changes back in.
This is not how Git works.
The way normal people approach this problem assumes that, usually, only one or two people are making changes. But how do you coordinate with 20 other people to find out who has the most recent copy then collect all 21 people’s changes?
The way developers used to approach this problem assumes that someone is in final charge. That a company or organisation runs a server which will decide whose changes are allowed, and whose are not.
How Git Works
Git is distributed, meaning that every computer where git is installed has its own master copy.
So every computer has a full history of any git project (aka. repository or ‘repo’). Indeed, you don’t have to synchronise your repo with any other computer or server at all! 1
In order to make this useful, you need ways to synchronise changes between computers that all think they’re right.
GitHub
GitHub is nothing special to Git, just another Git server with which to negotiate changes. Do not think of GitHub as the ‘master’ copy. There isn’t one.
There are, however, upstream and remote repositories.
An ‘upstream’ repository is where there’s a ‘gatekeeper’: e.g. the people who run PySAL have a repo that is considered the ‘gatekeeper’ for PySAL.
A remote repository is any repository with which your copy synchronises. So the remote repository can be ‘upstream’, or it can just be another computer you run, or your GitHub account.
Using Git
Getting Started
| Term | Means |
|---|---|
| Repository (Repo) | A project or archive stored in Git. |
| init | To create a new repo on your computer. |
| clone | To make a full copy of a repo somewhere else. |
This creates a local repo that is unsynchronised with anything else:
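A minimal sketch of such a command, assuming you start in a scratch directory and call the project folder `test` (the name the later examples `cd` into):

```shell
cd "$(mktemp -d)"   # assumption: work in a throwaway directory
git init test       # create the folder 'test' with an empty repo inside it
cd test
ls -a               # the hidden .git folder is where the history will live
git remote          # prints nothing: no other computer is involved yet
cd ..               # step back out; the later examples cd into 'test' again
```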
Whereas this creates a local clone that is fully synchronised with GitHub:
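Cloning from GitHub takes the form `git clone https://github.com/<user>/<repo>.git`, where `<user>` and `<repo>` are placeholders for a real account and repository. So that the sketch below runs without a network connection, it uses a local bare repository (the hypothetical `server.git`) as a stand-in for GitHub:

```shell
cd "$(mktemp -d)"              # assumption: work in a throwaway directory
git init --bare server.git     # stand-in for a repo hosted on GitHub
git clone server.git myclone   # with GitHub you would pass the https:// URL instead
cd myclone
git remote -v                  # 'origin' points back at the place you cloned from
```

Git warns that you have cloned an empty repository here, which is expected: the stand-in has no commits yet.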
Working on a File
| Term | Means |
|---|---|
| add | Add a file to a repo. |
| mv | Move/Rename a file in a repo. |
| rm | Remove a file from a repo. |
For example:

    cd test                    # Back into the new repo
    touch README.md            # Create an empty file called README.md
    git add README.md          # Add it to the repository
    git mv README.md fileA.md  # Rename it (move it)
    git rm fileA.md            # Remove it... which is an error!

This produces:

    error: the following file has changes staged in the index:
        fileA.md
    (use --cached to keep the file, or -f to force removal)

This is telling you that you can force removal (`git rm -f fileA.md`) if you really want, but you’d probably be better off committing the changes that have been ‘staged’… more on this in a second!
Also: no one else knows about these changes yet!
Looking at the History
| Term | Means |
|---|---|
| diff | Show changes between commits. |
| status | Show status of files in repo. |
| log | Show history of commits. |
For example, running `git status` at this point produces:

    On branch master

    No commits yet

    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
            new file:   fileA.md

So again, git is giving us hints as to the options: ‘changes to be committed’ vs. ‘unstage’ the changes. We can also see which files are to be committed (i.e. have changed).
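The ‘unstage’ hint in that output can be tried out directly. This is a minimal sketch, assuming a throwaway repo called `demo` (a hypothetical name) so that your real project is left alone:

```shell
cd "$(mktemp -d)" && git init -q demo && cd demo   # assumption: throwaway repo
touch fileA.md
git add fileA.md
git status --short         # 'A  fileA.md': staged, waiting to be committed
git rm --cached fileA.md   # unstage it; the file itself stays on disk
git status --short         # '?? fileA.md': present, but no longer tracked
```

The `--short` flag gives a compact, one-line-per-file version of the same status report.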
Working on a Project or File
| Term | Means |
|---|---|
| commit | To record changes to the repo. |
| branch | Create or delete branches. |
| checkout | Jump to a different branch. |
For example:
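A minimal, self-contained sketch of the commit step, assuming a throwaway repo and a placeholder identity (both hypothetical; in your own repo you would run only the `git commit` and `git status` lines):

```shell
cd "$(mktemp -d)" && git init -q test && cd test   # assumption: throwaway repo
git config user.name "A Student"                   # placeholder identity, needed to commit
git config user.email "student@example.com"
touch README.md
git add README.md
git mv README.md fileA.md
git commit -m "Added and then renamed the README Markdown file."
git status                                         # the working tree should now be clean
git rev-parse --short HEAD                         # the short identifier of this commit
```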
You should see:

    [master (root-commit) e7a0b25] Added and then renamed the README Markdown file.
     1 file changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 fileA.md
    # ... and then this:
    On branch master
    nothing to commit, working tree clean

Make a note of the number after ‘root-commit’ (e7a0b25 here; yours will differ)!
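The table above also lists `branch` and `checkout`. A minimal sketch of how they fit together, assuming a throwaway repo and a hypothetical branch name `experiment`:

```shell
cd "$(mktemp -d)" && git init -q demo && cd demo   # assumption: throwaway repo
git config user.name "A Student"                   # placeholder identity
git config user.email "student@example.com"
touch fileA.md && git add fileA.md && git commit -q -m "First commit"
git branch experiment                   # create a branch
git checkout experiment                 # jump to it
echo "draft" > fileA.md
git commit -q -am "Try something out"   # recorded on 'experiment' only
git checkout -                          # jump back to the branch you were on before
cat fileA.md                            # empty again: the draft lives on its own branch
git branch                              # lists both branches, with a * by the current one
```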
Recovery

    git rm fileA.md
    git status
    git commit -m "Removed file."
    ls
    git checkout <number you wrote down earlier>
    ls

So every operation on a file is recorded in the repository: adding, renaming, deleting, and so on. And we can roll back any change at any time. For plain-text files (such as Markdown, Python and R scripts) these changes are recorded at the level of each line of code, so you can jump around through the entire history of a project and trace exactly what changes you (or anyone else) made, and when.
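Tracing line-level changes through the history can be sketched with `git log` and `git diff`. The repo, file name and identity below are hypothetical stand-ins:

```shell
cd "$(mktemp -d)" && git init -q demo && cd demo   # assumption: throwaway repo
git config user.name "A Student"                   # placeholder identity
git config user.email "student@example.com"
echo "first line" > notes.md
git add notes.md && git commit -q -m "Start notes"
echo "second line" >> notes.md
git commit -q -am "Add a second line"
git log --oneline        # one line per commit, newest first
git diff HEAD~1 HEAD     # the line-level change between the last two commits
git checkout -q HEAD~1   # step back one commit (a 'detached HEAD')
cat notes.md             # only 'first line' at this point in history
git checkout -q -        # return to the branch you came from
```

Here `HEAD~1` means ‘the commit before the current one’; a commit number that you wrote down works in the same places.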
Collaborating on a Project
| Term | Means |
|---|---|
| pull | To request changes on a repo from another computer. |
| push | To send changes on a repo to another computer. |
For example:

    git push

All changes are local until pushed.
If you forget to push your changes (e.g. to GitHub) then you are not backed up if your computer dies.
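To see that changes really do stay local until pushed, the sketch below plays both sides on one machine. It assumes a local bare repository (the hypothetical `server.git`) standing in for GitHub, and two clones named `alice` and `bob` standing in for two collaborators' computers:

```shell
cd "$(mktemp -d)"                       # assumption: work in a throwaway directory
git init -q --bare server.git           # stand-in for GitHub
git clone -q server.git alice && cd alice
git config user.name "Alice"            # placeholder identity
git config user.email "alice@example.com"
echo "line 1" > notes.md
git add notes.md && git commit -q -m "Start notes"
git push -q -u origin HEAD              # first push; -u remembers the destination
cd .. && git clone -q server.git bob    # bob takes his own full copy
cd alice
echo "line 2" >> notes.md
git commit -q -am "Add a second line"   # recorded on alice's copy only
cat ../bob/notes.md                     # still just 'line 1'
git push -q                             # now the server has the new commit
cd ../bob && git pull -q                # bob asks the server for changes
cat notes.md                            # both lines
```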
This is not easy. 2
A Dropbox Analogy
- Think of JupyterLab as being like Word or Excel: an application that allows you to read/write/edit notebook files.
- Think of GitHub as being like Dropbox: a place somewhere in the cloud that files on your home machine can be backed up.
But Dropbox doesn’t have the .gitignore file!
So why use Git? It gives you a full history of everything for as far back as the project goes and much finer-grained control over files and synchronisation than Dropbox. If you don’t add a file to git it can live quite happily in your git repository but will never synchronise.
Like Dropbox, GitHub offers a lot of ‘value added’ features (like simple text editing) on top of the basic service of ‘storing files’.
Dropbox will automatically back up anything that you put in your special Dropbox folder. Git will only back up the things that you tell it to back up, even if they are in the Git folder!
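A minimal sketch of that selectiveness, using hypothetical file names and assuming a throwaway repo: one file is added, one is simply never added, and one is excluded outright through `.gitignore`.

```shell
cd "$(mktemp -d)" && git init -q demo && cd demo   # assumption: throwaway repo
touch notebook.ipynb big_data.csv scratch.txt      # hypothetical files
echo "*.csv" > .gitignore           # tell Git to ignore every CSV file
git add notebook.ipynb .gitignore   # only these two will ever be synchronised
git status --short                  # shows the two added files, and scratch.txt as
                                    # untracked (??); big_data.csv is not listed at all
git check-ignore big_data.csv       # prints the name: this file is ignored
```

The difference is that `scratch.txt` keeps showing up as untracked until you decide what to do with it, whereas ignored files are hidden from the status report.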
A Rock-Climbing Analogy
A Note on Workflow
So your workflow should be:
- Save edits to Jupyter notebook.
- Run `git add <filename.ipynb>` to record changes to the notebook (obviously replace `<filename.ipynb>` completely with the notebook filename).
- Run `git commit -m "Adding notes based on lecture"` (or whatever message is appropriate: `-m` means ‘message’).
- Then run `git push` to push the changes to GitHub.
If any of those commands indicate that there are no changes being recorded/pushed then it might be that you’re not editing the file that you think you are (this happens to me!).
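One way to check which file Git thinks has changed is `git status --short`. This is a sketch, assuming a throwaway repo and a hypothetical notebook called `notes.ipynb` (the file contents are stand-ins, not a real notebook):

```shell
cd "$(mktemp -d)" && git init -q demo && cd demo   # assumption: throwaway repo
git config user.name "A Student"                   # placeholder identity
git config user.email "student@example.com"
echo "version 1" > notes.ipynb
git add notes.ipynb
git commit -q -m "Adding notes based on lecture"
git status --short                  # prints nothing: no changes to record
git diff --quiet && echo "no edits in tracked files"
echo "version 2" > notes.ipynb      # a real edit to the tracked file...
git status --short                  # ...shows up as ' M notes.ipynb'
```

If the file you believe you edited is not listed after saving, you are probably editing a copy somewhere else.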
On the GitHub web site you may need to force reload the view of the repository: Shift + Reload button usually does it in most browsers. You may also need to wait 5 to 10 seconds for the changes to become ‘visible’ before reloading. It’s not quite instantaneous.
Resources
- Understanding Git (Part 1) – Explain it Like I’m Five
- Trying Git
- Visualising Git
- Git Novice
- Git Cheat Sheet: Commands and Best Practices
- Andy’s R-focussed Tutorial
I now have everything in Git repos: articles, research, presentations, modules… the uses are basically endless once you start using Markdown heavily (even if you don’t do much coding).